Dynamical Structures of High-Frequency Financial Data 
We study the dynamical behavior of high-frequency data from the Korean Stock Price Index 
(KOSPI) using the movement of returns in Korean financial markets. The dynamical behavior for a 
binarized series of our models is not completely random. The conditional probability is numerically 
estimated from a return series of KOSPI tick data. Non-trivial probability structures can be consti- 
tuted from binary time series of autoregressive (AR), logit, and probit models, for which the Akaike 
Information Criterion shows a minimum value at the 15th order. From our results, we find that the 
value of the correct match ratio for the AR model is slightly larger than those of the other models. 
Recent investigation of differently scaled economic
systems has received considerable attention as
an interdisciplinary field for physicists and economists
[1,2,3,4,5,6,7,8]. One of the challenging issues is to test
efficient market hypotheses from the perspective of em- 
pirical observations and theoretical considerations. To 
exploit or predict the dynamical behavior of continuous 
tick data for various financial assets [9, 10] is extremely 
desirable. Financial efficiency and predictability can significantly
benefit investors or agents in the financial market
and successfully reinforce the effective network between
them. For example, when the price of a stock rises
or falls in the stock market, a trader's decision to buy or 
sell is influenced by various strategies, external informa- 
tion, and other traders. One such strategy is to apply 
the up and down movement of returns to a correlation 
function and the conditional probability. This strategy, 
which is pivotal for predicting an investment, is a useful 
tool for understanding the stock transactions of a company
whose stock price is rising or falling. In the literature, 
Ohira et al. [9] mainly discussed conditional probability 
and the correct match ratio of high-frequency data for 
the yen-dollar exchange rate; they showed that such dy- 
namics is not completely random and that a probabilistic 
structure exists. Sazuka et al. [10] used the order k = 10 
of the Akaike Information Criterion (IC) to determine 
the predictable value of the autoregressive (AR) model; 
in contrast, they numerically calculated the 5th order of 
the logit model [11]. Motivated by such research, we
newly apply the AR, logit, and probit models
to the Korean financial market, which, in contrast to
active and well-established financial markets, is now in a
slightly unstable and risky state.



Interest in nonlinear models has recently grown, par- 
ticularly in the social, natural, medical, and engineering 
sciences. Statistical and mathematical physics provides 
a powerful and rigorous tool for analyzing social data. 
Moreover, several papers have focused on social phenom- 
ena models based on aspects of stochastic analysis, such 
as the diffusion, master, Langevin, and Fokker-Planck 
equations. Many researchers in econometrics or biometrics
have proposed the use of AR, logit, and probit models
in the formulation of discrete choices, including
binary analysis. Interestingly, Nakayama and Nakamura 
[16] associated the fashion phenomena of the bandwagon 
and snob effects with the logit model. To our knowledge, 
in addition to the Akaike IC, there are at least two other
similar criteria, namely the Hannan-Quinn IC and the
Schwarz IC. However, we restrict ourselves to the
Akaike IC as the residual test in order to minimize the
residual value for the binary analysis. Moreover, after
calculating the binary structures and their Akaike IC values,
we compute the correct match ratio, or the power of pre- 
dictability. Although the dynamical behavior of logit and 
probit models has been calculated and analyzed in scien- 
tific fields such as mathematics, economics, and econo- 
physics, until now these models have not been studied in 
detail with respect to financial markets.

In this letter, we present the future predictability func- 
tion of the AR, logit, and probit models, by using the tick 
data analysis of the Korean Stock Price Index (KOSPI) 
for the Korean financial market. By examining the bi- 
nary phenomena of a financial time series in terms of 
the nontrivial probability distribution, we show that the 
high-frequency data of our model follows a special conditional
probability structure for the up and down movement
of returns. Moreover, our results are of great importance
for making a powerful and capable tool that can be
used to investigate properties of efficient and predictable
markets.

FIG. 1: Plot of the correlation function, C(u), from the set
of one-minute tick data, Data A, of the KOSPI; the data were
collected from January 1997 to December 1998.

In our calculations, the return of the tick data at time 
t is R(t) = ln[p(t + Δt)/p(t)] for the price p(t), and
the return change is D(t) = R(t + 1) - R(t) for every
time t. From the series of tick data in one asset, we
can binarize the {X(t)} series as follows: X(t) = +1 if
D(t) > 0 and X(t) = -1 if D(t) < 0. We can then
extend the {X(t)} series to a random walk formalism as 
Z(t + 1) = Z(t) + X(t). Moreover, we can determine the 
cumulative probability distribution and the conditional 
probabilities from the random walk of the one-dimensional
zigzag motion. The correlation function can also be cal- 
culated as 

\[ C(u) = \langle D(t + u) D(t) \rangle. \tag{1} \]
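The definitions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for the KOSPI data: the function names `binarize` and `correlation` and the toy price list are hypothetical, the sampling step is taken as one list index, and ties with D(t) = 0, which the text leaves undefined, are simply skipped.

```python
import math

def binarize(prices):
    """Return (R, D, X, Z) for a list of positive prices sampled at a fixed step."""
    # Log returns R(t) = ln[p(t + dt) / p(t)]
    R = [math.log(prices[t + 1] / prices[t]) for t in range(len(prices) - 1)]
    # Return changes D(t) = R(t + 1) - R(t)
    D = [R[t + 1] - R[t] for t in range(len(R) - 1)]
    # Binary series: +1 if D > 0, -1 if D < 0; D == 0 is skipped (assumption)
    X = [1 if d > 0 else -1 for d in D if d != 0]
    # Zigzag walk Z(t + 1) = Z(t) + X(t), started at Z = 0 (assumption)
    Z = [0]
    for x in X:
        Z.append(Z[-1] + x)
    return R, D, X, Z

def correlation(D, u):
    """C(u) = <D(t + u) D(t)>, averaged over all t for which both terms exist."""
    n = len(D) - u
    return sum(D[t + u] * D[t] for t in range(n)) / n

# Toy prices, for illustration only (not KOSPI data)
prices = [100.0, 101.0, 100.5, 100.5, 102.0, 101.0, 101.5]
R, D, X, Z = binarize(prices)
c0, c1 = correlation(D, 0), correlation(D, 1)
```

For a long series, plotting `correlation(D, u)` against the lag u gives a curve of the kind shown in Fig. 1.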

We now introduce the AR, logit, and probit models
[11, 12, 13, 14] for an {X(t)} series of continuous tick
data. The AR model is defined by 

\[ \mathrm{AR}(k) = \alpha + \sum_{i=1}^{k} \alpha_i X(t - i) + \epsilon(t), \tag{2} \]

where ε(t) is white noise with a Gaussian distribution of
zero mean and variance σ. The standard logit model for
binary analysis [12] is described as 

\[ \mathrm{logit}(p) = \log \frac{p}{1 - p} = \beta + \sum_{i=1}^{k} \beta_i X(t - i) + \epsilon(t), \tag{3} \]

where p is a dummy variable between 0 and 1. The linear
probit model from Eq. (3) is represented in terms of

\[ \mathrm{probit}(p) = \Phi^{-1}(p) = z_p, \tag{4} \]



TABLE I: Values of conditional probability from the simula- 
tion results of Data A and Data B; NP stands for the number 
of tick data points. 
where $\Phi^{-1}(\cdot)$ is the inverse of the standard normal
cumulative distribution function, and the standard normal
cumulative distribution function is given by
$\Phi(z_p) = \Pr(z < z_p) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z_p} dz \, \exp(-z^2/2)$.
Furthermore, we make use of Eqs. (2)-(4) to find the binary
structure and its correct match ratio, and these
mathematical techniques lead us to more general results of
predictability. To determine the optimal order k of
our model, we define the Akaike IC [12, 13] as

\[ \mathrm{AIC} = \frac{2}{T} \left[ -\ln M_l + \ln M_p \right] \tag{5} \]

for the sample size T, where M_l and M_p stand for the
maximum likelihood and the number of parameters,
respectively.
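As a numerical illustration of the link functions in Eqs. (3) and (4), the following sketch evaluates the logit and probit transforms and their inverses with the Python standard library. The function names are hypothetical and chosen only for this example; the sketch shows the transforms themselves and does not fit the models to data.

```python
import math
from statistics import NormalDist

# Standard normal distribution (zero mean, unit standard deviation)
_STD_NORMAL = NormalDist()

def logit(p):
    """Log-odds log[p / (1 - p)] for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(y):
    """Inverse of logit: maps a real number back to a probability."""
    return 1.0 / (1.0 + math.exp(-y))

def probit(p):
    """Inverse standard normal CDF, i.e. the quantile z_p with Pr(z < z_p) = p."""
    return _STD_NORMAL.inv_cdf(p)

def inv_probit(z):
    """Standard normal CDF, the inverse of probit."""
    return _STD_NORMAL.cdf(z)

# Both links map p = 0.5 to zero and are antisymmetric about it
values = [(p, logit(p), probit(p)) for p in (0.1, 0.5, 0.9)]
```

Both transforms map probabilities in (0, 1) to the whole real line, which is why a linear combination of lagged X(t - i) can appear on the right-hand side of Eq. (3).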

To analyze the correlation function and the conditional
probability, we take the KOSPI in the Korean financial
market as our underlying asset. We consider
two delivery periods: the first set of data, Data A, was
from January 1997 to December 1998; the second set,
Data B, was from January 2004 to December 2004. The
time lag of both sets of tick data is about one minute. Data
A contains 133,823 data points and Data B contains
86,561 data points.

From the two sets of tick data, we computed two series: the
X(t) series and the Z(t) series, where Z(t) represents a
one-dimensional zigzag motion. This computation corresponds
to a binary strategy of the buy and sell trend of traders
in financial markets. Fig. 1 plots the correlation function,
C(u), which we obtained from the return change D(t).
The plot suggests that the one-minute returns for Data A of
the KOSPI are not entirely independent, and so deviate
from the random walk model, but are almost independent
for long periods. Given the probabilistic structure of our
model, we can deduce from the correlation function that
the dynamical behavior is not completely random.

FIG. 2: Conditional probabilities P(+|m = 3) for the set of
one-minute tick data, Data A, of the KOSPI.

FIG. 3: Plot of conditional probabilities P(+|m) and P(-|m)
for the set of one-minute tick data, Data A, of the KOSPI.

By quantitative analysis, we can relate the X(t) series 
to conditional probability. To analyze the high-frequency 
data of the KOSPI, we concentrated on the up and down 
return movements in terms of conditional probability. 
The parameter P(+|+,+) refers to the conditional probability
that a trend in the price returns continues in
the same direction; that is, that the price is likely to rise
after two consecutive upward steps. Table
1 summarizes the results of various conditional probabilities
for Data A and Data B of the KOSPI. Fig. 2
shows that the conditional probability P(+|+,+,+)
has a remarkably larger value than the other conditional
probabilities P(+|m = 3). From our results,
we can give the relation of the three parameters as
P(+|+,+) = β, P(+|-,+) = β′, and P(+|+,+,+) =
β + α for 0 < α < β < 1 and 0 < β′ < 1. Figure 3 shows
that the conditional probability P(+|m) (P(-|m)) has
a larger value than P(+|m - 1) (P(-|m - 1)), which
corresponds to one selling state or buying state after m - 1
selling states or m - 1 buying states. When we compare
this result to that of the yen-dollar exchange rate of
the Japanese financial market, our conditional probabilities
for m < 5 have a slightly larger value than those of
the yen-dollar exchange rate [9]. The values of P(+|m)
and P(-|m) for m < 6 increase continuously while the
two values for m > 6 are almost constant; in this case,
the period of the m states is about m minutes in real
time. We expect this result to be consistent with the
buy-sell strategy of dealers, which can change within a few
minutes. Note that although Data A and Data B share a
significant similarity, we cannot understand the behavior
of these data sets from a random walk model that has
fixed values for conditional probabilities.

FIG. 4: Plot of the Akaike IC values for the AR model (the
value of the left y-axis) for Data A and of the logit model (the
value of the right y-axis) for Data B; in each case, the Akaike
IC value decreases gradually as the order of the model grows.

TABLE II: Values of the correct match ratio from the simulation
results of Data A and Data B.

KOSPI          Data A     Data B
NP             133,823    86,561
AR model       65.3%      52.2%
Probit model   51.4%      50.2%
Logit model    48.6%      49.8%
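The conditional probabilities discussed above can be estimated by counting how often an up move follows each observed history of length m. The sketch below is one possible counting scheme under stated assumptions: the function name `conditional_up_probability` and the short series are hypothetical, histories are stored in chronological order (oldest sign first), and no smoothing is applied to rare histories.

```python
from collections import Counter

def conditional_up_probability(X, m):
    """Estimate P(+ | previous m signs) for every length-m history seen in X.

    X is a list of +1/-1 values; each history is a tuple in chronological order.
    """
    history_counts = Counter()
    up_counts = Counter()
    for t in range(m, len(X)):
        history = tuple(X[t - m:t])
        history_counts[history] += 1
        if X[t] == 1:
            up_counts[history] += 1
    return {h: up_counts[h] / n for h, n in history_counts.items()}

# Toy binary series, for illustration only (not KOSPI data)
X = [1, 1, -1, 1, 1, 1, -1, -1, 1, 1]
probs = conditional_up_probability(X, 2)
p_up_after_two_ups = probs[(1, 1)]  # estimate of P(+|+,+)
```

For a fair random walk every entry would approach 0.5 as the series grows, so systematic departures from 0.5 indicate the kind of probabilistic structure reported for Data A and Data B.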

For simplicity, we used the AR, logit, and probit models
to analyze the X(t) series for high-frequency tick data
of the Korean financial market. As shown in Fig. 4, we
found that the Akaike IC values for the AR and logit 
models decrease gradually as the order of the models increases.
Because the Akaike IC for the three models has
approximately the same value for orders larger than
k = 15, we take the 15th order as the
minimum; in addition, this order is comparable to the 10th
order of the AR model of the yen-dollar exchange rate
[11]. Moreover, the functional shape of the logit model is similar
to that of the probit model, and each probability
structure tends to move continuously in the same direction.
By minimizing the Akaike IC value of our model,
we were also able to calculate the correct match ratio. 
Table 2 shows the values of the correct match ratios for 
Data A and Data B. The AR model of Data A has a
higher correct match ratio (65.3%) than the other models;
in contrast, the logit model has the smallest
value for both data sets.
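The order selection and the correct match ratio can be illustrated with a small, self-contained sketch. It rests on several assumptions that are not taken from the analysis above: the AR(k) coefficients are obtained by ordinary least squares, the criterion is the textbook Gaussian form T ln(RSS/T) + 2 M_p rather than Eq. (5), the match ratio is computed in-sample, and the data are a synthetic persistent sign series rather than KOSPI ticks. All function names are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ar(X, k):
    """Least-squares fit of X(t) = a0 + sum_i a_i X(t - i); returns (coeffs, rss, T)."""
    rows = [[1.0] + [X[t - i] for i in range(1, k + 1)] for t in range(k, len(X))]
    y = [X[t] for t in range(k, len(X))]
    p = k + 1
    # Normal equations (R^T R) a = R^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(p)]
    coeffs = solve(A, b)
    rss = sum((yt - sum(c * v for c, v in zip(coeffs, r))) ** 2
              for r, yt in zip(rows, y))
    return coeffs, rss, len(y)

def gaussian_aic(rss, T, n_params):
    """Textbook Gaussian AIC up to an additive constant (not Eq. (5) verbatim)."""
    return T * math.log(rss / T) + 2 * n_params

def match_ratio(X, coeffs):
    """Fraction of steps where the sign of the AR forecast equals X(t) (in-sample)."""
    k = len(coeffs) - 1
    hits = 0
    total = 0
    for t in range(k, len(X)):
        pred = coeffs[0] + sum(coeffs[i] * X[t - i] for i in range(1, k + 1))
        sign = 1 if pred >= 0 else -1
        hits += sign == X[t]
        total += 1
    return hits / total

# Synthetic series: each sign repeats the previous one with probability 0.7
random.seed(0)
X = [1]
for _ in range(4999):
    X.append(X[-1] if random.random() < 0.7 else -X[-1])

# Scan a few orders; each fit uses len(X) - k points, a small simplification
aic_by_order = {}
for k in range(1, 4):
    coeffs, rss, T = fit_ar(X, k)
    aic_by_order[k] = gaussian_aic(rss, T, k + 1)
best_k = min(aic_by_order, key=aic_by_order.get)
best_coeffs, _, _ = fit_ar(X, best_k)
ratio = match_ratio(X, best_coeffs)
```

On this synthetic series the ratio should lie close to the built-in persistence of 0.7. An out-of-sample split, in which the coefficients are fitted on one part of the series and the signs are scored on the rest, would give a stricter measure of predictability.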

In conclusion, we used the AR, logit, and probit models 
to determine the probability structure of high-frequency 
tick data of the KOSPI in the Korean financial market. 
The value of our conditional probability of the KOSPI
is slightly greater than that of the yen-dollar exchange
rate. Our results show that the Korean financial market 
is slightly unstable and less systematic than other finan- 
cial markets, though the results may be related to actual 
transactions of all assets. In addition, by using the AR,
probit, and logit models, we compare the forecasted
(or simulated) sign with the sign of the actual returns.
This comparison enables us to obtain the correct
match ratio. Moreover, because the match ratio of the AR and
probit models is greater than 0.5, we can conclude that these
models have an improved forecasting capability. The AR model,
which is expected to have a higher predictable value only 
in the Korean financial market, robustly supports the fu- 
ture predictability of price movement trends in financial 
markets. We also note that, with nonlinear models of 
data analysis, international finance theories can offer an 
enhanced interpretation of results. For the past decade, 
many econophysical investigations have led to greater ap- 
preciation of, and insight into, scale invariance and the 
universality of statistical approaches to physics and eco- 
nomics. Our results should encourage interdisciplinary 
research of physics and economics. 